ml fairness
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness
Understanding different human attributes and how they affect model behavior may become a standard need for all model creation and usage, from traditional computer vision tasks to the newest multimodal generative AI systems. In computer vision specifically, we have relied on datasets augmented with perceived attribute signals (e.g., gender presentation, skin tone, and age) and benchmarks enabled by these datasets. Typically, labels for these tasks come from human annotators. However, annotating attribute signals, especially skin tone, is a difficult and subjective task. Perceived skin tone is affected by technical factors, like lighting conditions, and social factors that shape an annotator's lived experience. This paper examines the subjectivity of skin tone annotation through a series of annotation experiments using the Monk Skin Tone (MST) scale \cite{Monk2022Monk}, a small pool of professional photographers, and a much larger pool of trained crowdsourced annotators. Along with this study we release the Monk Skin Tone Examples (MST-E) dataset, containing 1515 images and 31 videos spread across the full MST scale. MST-E is designed to help train human annotators to annotate MST effectively. Our study shows that annotators can reliably annotate skin tone in a way that aligns with an expert in the MST scale, even under challenging environmental conditions. We also find evidence that annotators from different geographic regions rely on different mental models of MST categories, resulting in annotations that systematically vary across regions. Given this, we advise practitioners to use a diverse set of annotators and a higher replication count for each image when annotating skin tone for fairness research.
Machine Learning Fairness for Depression Detection using EEG Data
Kwok, Angus Man Ho, Cheong, Jiaee, Kalkan, Sinan, Gunes, Hatice
This paper presents the very first attempt to evaluate machine learning fairness for depression detection using electroencephalogram (EEG) data. We conduct experiments using different deep learning architectures such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) networks across three EEG datasets: Mumtaz, MODMA and Rest. We employ five different bias mitigation strategies at the pre-, in- and post-processing stages and evaluate their effectiveness. Our experimental results show that bias exists in existing EEG datasets and algorithms for depression detection, and different bias mitigation methods address bias at different levels across different fairness measures.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits
Deng, Wesley Hanwen, Nagireddy, Manish, Lee, Michelle Seng Ah, Singh, Jatinder, Wu, Zhiwei Steven, Holstein, Kenneth, Zhu, Haiyi
Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems. However, there has been little research investigating how ML practitioners actually use these toolkits in practice. In this paper, we conducted the first in-depth empirical exploration of how industry practitioners (try to) work with existing fairness toolkits. In particular, we conducted think-aloud interviews to understand how participants learn about and use fairness toolkits, and explored the generality of our findings through an anonymous online survey. We identified several opportunities for fairness toolkits to better address practitioner needs and scaffold them in using toolkits effectively and responsibly. Based on these findings, we highlight implications for the design of future open-source fairness toolkits that can support practitioners in better contextualizing, communicating, and collaborating around ML fairness efforts.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Austria > Vienna (0.14)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- Government (0.93)
Rethinking Fairness: An Interdisciplinary Survey of Critiques of Hegemonic ML Fairness Approaches
This survey article assesses and compares existing critiques of current fairness-enhancing technical interventions in machine learning (ML) that draw from a range of non-computing disciplines, including philosophy, feminist studies, critical race and ethnic studies, legal studies, anthropology, and science and technology studies. It bridges epistemic divides in order to offer an interdisciplinary understanding of the possibilities and limits of hegemonic computational approaches to ML fairness for producing just outcomes for society's most marginalized. The article is organized according to nine major themes of critique wherein these different fields intersect: 1) how "fairness" in AI fairness research gets defined; 2) how problems for AI systems to address get formulated; 3) the impacts of abstraction on how AI tools function and its propensity to lead to technological solutionism; 4) how racial classification operates within AI fairness research; 5) the use of AI fairness measures to avoid regulation and engage in ethics washing; 6) an absence of participatory design and democratic deliberation in AI fairness considerations; 7) data collection practices that entrench "bias," are non-consensual, and lack transparency; 8) the predatory inclusion of marginalized groups into AI systems; and 9) a lack of engagement with AI's long-term social and ethical outcomes. Drawing from these critiques, the article concludes by imagining future ML fairness research directions that actively disrupt entrenched power dynamics and structural injustices in society.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Research Report (1.00)
- Overview (1.00)
- Summary/Review (0.93)
A Look at Google's Efforts to Earn Public Trust Through ML Fairness and Responsible AI
For years, we've been hearing about major AI initiatives in international enterprises. Companies afraid of being left behind in the AI revolution pushed its implementation up a whopping 270 percent from 2015 to 2019, according to a Gartner report that surveyed more than 3,000 executives in 89 countries. Along with industry, AI is also enabling our modern smart homes, and has even found its way into gaming and leisure activities. AI's increasing presence has attracted no small amount of criticism, often with good reason. Last year, an AI-powered "DeepNude" web project that enabled users to remove people's clothing in images (trained mostly on women) drew sharp criticism and was taken down by its developers. A few months ago, "Genderify," an AI-powered tool designed to identify a person's gender by analyzing their name, username or email address, triggered a backlash on social media and was also shut down.
50 Years of Test (Un)fairness: Lessons for Machine Learning
Hutchinson, Ben, Mitchell, Margaret
Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois (0.04)
- Oceania > New Zealand (0.04)
- Government (1.00)
- Education > Assessment & Standards (0.70)
- Law > Civil Rights & Constitutional Law (0.68)
- Law > Labor & Employment Law (0.46)